(54) Processor and method for speculatively executing a conditional branch instruction utilizing
a selected one of multiple branch prediction methodologies 



(57) A processor (10) and method for speculatively
executing branch instructions utilizing a selected branch
prediction methodology are disclosed. The processor
has one or more execution units (22, 28, 30) for executing
instructions, including a branch processing unit (18)
for executing branch instructions. The branch processing
unit includes selection logic for selecting one of a
plurality of branch prediction methodologies and a
branch prediction unit for predicting the resolution of a
conditional branch instruction utilizing the selected
branch prediction methodology. The branch processing
unit further includes execution facilities for speculatively
executing the conditional branch instruction based upon
the prediction. Based upon the outcome of the prediction,
the selection logic selects a branch prediction
methodology for predicting a subsequent conditional
branch instruction so that branch prediction accuracy is
enhanced. In one embodiment, the multiple branch
prediction methodologies include static and dynamic
branch prediction.
Description
BACKGROUND 

1. Technical Field: 

The technical field of the present specification relates
in general to a method and system for data
processing and in particular to a processor and method 
for executing branch instructions. Still more particularly, 
the technical field relates to a processor and method for 
speculatively executing a conditional branch instruction 
utilizing a selected one of multiple branch prediction 
methodologies. 

2. Description of the Related Art:

A state-of-the-art superscalar processor typically 
includes an instruction cache for storing instructions, an 
instruction buffer for temporarily storing instructions 
fetched from the instruction cache for execution, one or 
more execution units for executing sequential instruc- 
tions, a branch processing unit (BPU) for executing 
branch instructions, a dispatch unit for dispatching se- 
quential instructions from the instruction buffer to partic- 
ular execution units, and a completion buffer for tempo- 
rarily storing sequential instructions that have finished 
execution, but have not completed. 

Branch instructions executed by the branch 
processing unit (BPU) of the superscalar processor can 
be classified as either conditional or unconditional 
branch instructions. Unconditional branch instructions 
are branch instructions that change the flow of program 
execution from a sequential path to a specified target 
execution path and which do not depend upon a condition
supplied by the execution of another instruction.
Thus, the branch specified by an unconditional branch 
instruction is always taken. In contrast, conditional 
branch instructions are branch instructions for which the 
indicated branch in program flow may be taken or not 
taken depending upon a condition supplied by the exe- 
cution of another instruction. Conditional branch instruc- 
tions can be further classified as either resolved or un- 
resolved, based upon whether or not the condition upon 
which the branch depends is available when the condi- 
tional branch instruction is evaluated by the branch 
processing unit (BPU). Because the condition upon 
which a resolved conditional branch instruction depends 
is known prior to execution, resolved conditional branch 
instructions can typically be executed and instructions 
within the target execution path fetched with little or no 
delay in the execution of sequential instructions. Unre- 
solved conditional branches, on the other hand, can cre- 
ate significant performance penalties if fetching of se- 
quential instructions is delayed until the condition upon 
which the branch depends becomes available and the 
branch is resolved. 

Therefore, in order to enhance performance, some
processors speculatively execute unresolved branch instructions
by predicting whether or not the indicated
branch will be taken. Utilizing the result of the prediction,
the fetcher is then able to fetch instructions within the
speculative execution path prior to the resolution of the
branch, thereby avoiding a stall in the execution pipeline
if the branch is resolved as correctly predicted.

Processors having branch prediction facilities typically
employ either a static or dynamic branch prediction
methodology. One of the simplest implementations of
static branch prediction is to guess all backward-going
branches as taken and all forward-going branches as
not taken. In an alternative implementation of static
branch prediction, each branch instruction within a program
is associated with a [illegible in source] whether the
branch should be predicted as taken. Thus, based upon
information gleaned from the program during compilation,
the compiler dictates whether or not the branch
will be predicted as taken or not taken if executed speculatively.
In contrast to the software-based approach utilized
in static branch prediction, dynamic branch prediction
records the resolution of particular branch instructions
within a Branch History Table (BHT) and utilizes
the previous resolutions stored in the table to predict
subsequent branches.
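
The simplest static scheme described above, in which backward-going branches are guessed taken and forward-going branches not taken, might be sketched as follows. This is only a minimal illustration: the function name `predict_static` and the treatment of a branch whose target equals its own address as backward-going are assumptions made here, not details given in the specification.

```python
def predict_static(branch_address: int, target_address: int) -> bool:
    """Return True to predict 'taken', False to predict 'not taken'.

    Assumption (not from the specification): a branch to its own
    address is treated as backward-going.
    """
    # Backward-going branches are guessed taken;
    # forward-going branches are guessed not taken.
    return target_address <= branch_address


if __name__ == "__main__":
    print(predict_static(0x1040, 0x1000))  # backward-going branch
    print(predict_static(0x1040, 0x1080))  # forward-going branch
```

No run-time history is consulted in this sketch, which is what distinguishes a static scheme from the Branch History Table approach.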

Although the accuracy of both static branch prediction
and dynamic branch prediction is fairly high, with
static prediction averaging between 60%-70% accuracy
and dynamic prediction averaging 90%-97% accuracy,
[illegible in source] particular instruction scenarios
[illegible in source]. For example, a
static prediction methodology that predicts forward
branches as not taken and backward branches as taken
will result in 100% misprediction for instruction sequences
in which forward-going branches are always taken.
Although this particular type of code sequence is ideal
for dynamic branch prediction, static branch prediction
is superior to dynamic branch prediction for other code
sequences. For example, a branch contained within a
loop that is taken on alternating count indices will be mispredicted
100% of the time utilizing dynamic branch prediction,
but only 50% of the time utilizing static branch
prediction. [Remainder of paragraph illegible in source;
the legible fragments mention a small branch history table.]

EP 0 805 390 A1

To address the deficiencies inherent in each type of
branch prediction, some processors provide a mode bit
that enables the branch processing unit (BPU) to utilize
either a static or dynamic branch prediction mechanism,
depending on the state of the mode bit. However, such
processors do not permit the branch processing unit
(BPU) to intelligently select which branch prediction
methodology should be employed for a particular instance
of a branch instruction. Consequently, a processor
and method for speculatively executing conditional
branch instructions are needed which intelligently select
one of multiple branch prediction methodologies.
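
The 100% and 50% misprediction figures quoted in this background for a branch taken on alternating loop indices can be reproduced with a small simulation. The sketch below is illustrative only. It assumes a dynamic predictor that simply repeats the previous resolution of the branch (a single history entry whose starting value is opposite to the first outcome) and a static predictor that always guesses taken; the specification does not state which dynamic scheme underlies its figures, and all function names are hypothetical.

```python
from typing import List


def alternating_outcomes(n: int) -> List[bool]:
    """Branch resolutions for a loop branch taken on even indices only."""
    return [i % 2 == 0 for i in range(n)]


def static_mispredictions(outcomes: List[bool], predict_taken: bool) -> int:
    """Count mispredictions when the same fixed guess is used every time."""
    return sum(1 for taken in outcomes if taken != predict_taken)


def last_outcome_mispredictions(outcomes: List[bool], initial: bool) -> int:
    """Count mispredictions for a predictor that repeats the last resolution.

    Assumption: one history entry, seeded with `initial`.
    """
    prediction = initial
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        prediction = taken  # record the resolution for the next prediction
    return misses


if __name__ == "__main__":
    outcomes = alternating_outcomes(100)
    print(static_mispredictions(outcomes, predict_taken=True))    # 50
    print(last_outcome_mispredictions(outcomes, initial=False))   # 100
```

A predictor with more history per entry (for example a saturating counter) would behave differently on this sequence, so the 100% figure should be read as specific to the last-outcome assumption made here.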

SUMMARY OF THE INVENTION 

In a first aspect, the present invention provides an 
improved processor and method for speculatively exe- 
cuting conditional branch instructions utilizing a select- 
ed one of multiple branch prediction methodologies. 

A processor and method according to preferred embodiments
of the invention for speculatively executing
branch instructions utilizing a selected branch prediction
methodology are disclosed. The processor has one or
more execution units for executing instructions, including
a branch processing unit for executing branch instructions.
The branch processing unit includes selection
logic for selecting one of a plurality of branch prediction
methodologies and a branch prediction unit for predicting
the resolution of a conditional branch instruction
utilizing the selected branch prediction methodology.
The branch processing unit further includes execution
facilities for speculatively executing the conditional
branch instruction based upon the prediction. Based upon
the outcome of the prediction, the selection logic preferably
selects a branch prediction methodology for predicting
a subsequent conditional branch instruction so
that branch prediction accuracy is enhanced. In one embodiment,
the multiple branch prediction methodologies
include static and dynamic branch prediction.

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention, as well as a preferred mode of use, 
and advantages thereof, will best be understood by ref- 
erence to the following detailed description of an illus- 
trative embodiment when read in conjunction with the 
accompanying drawings, wherein: 

Figure 1 illustrates a block diagram of an illustrative 
embodiment of a processor, which includes a 
branch processing unit; 

Figure 2 depicts a more detailed block diagram of 
the branch processing unit illustrated within Figure 
1; 

Figure 3 is a pictorial representation of the branch history
table (BHT) of the branch processing unit
(BPU) depicted within Figure 2;

Figure 4 depicts a state diagram of the method employed
by the branch processing unit illustrated in
Figure 2 to select a branch prediction methodology
utilized to predict the resolution of a speculatively
executed conditional branch instruction; and

Figure 5 is a flowchart illustrating a method for executing
a branch instruction in accordance with the
illustrative embodiment depicted within Figure 1.

DETAILED DESCRIPTION 

With reference now to the figures and in particular
with reference to Figure 1, there is depicted a block diagram
of an illustrative embodiment of a processor, indicated
generally at 10, for processing information in accordance
with the invention recited within the appended
claims. In the depicted illustrative embodiment, processor
10 comprises a single integrated circuit superscalar
microprocessor. Accordingly, as discussed further below,
processor 10 includes various execution units, registers,
buffers, memories, and other functional units,
which are all formed by integrated circuitry. Processor
10 preferably comprises one of the PowerPC™ line of
microprocessors available from IBM Microelectronics,
which operates according to reduced instruction set
computing (RISC) techniques; however, those skilled in
the art will appreciate that other suitable processors can
be utilized. As illustrated in Figure 1, processor 10 is
coupled to system bus 11 via a bus interface unit (BIU)
12 within processor 10. BIU 12 controls the transfer of
information between processor 10 and other devices
coupled to system bus 11, such as a main memory (not
illustrated). Processor 10, system bus 11, and the other
devices coupled to system bus 11 together form a
data processing system.

BIU 12 is connected to instruction cache 14 and data
cache 16 within processor 10. High-speed caches,
such as instruction cache 14 and data cache 16, enable
processor 10 to achieve relatively fast access time to a
subset of data or instructions previously transferred
from main memory to caches 14 and 16, thus improving
the speed of operation of the data processing system.
Instruction cache 14 is further coupled to sequential
fetcher 17, which fetches instructions for execution from
instruction cache 14 during each cycle. Sequential
fetcher 17 transmits instructions fetched from instruction
cache 14 to both branch processing unit (BPU) 18 and
instruction queue 19, which decode the instructions to
determine whether the instructions are branch or sequential
instructions. Branch instructions are retained
by BPU 18 for execution and cancelled from instruction
queue 19; sequential instructions, on the other hand, are
cancelled from BPU 18 and stored within instruction
queue 19 for subsequent execution by other execution
circuitry within processor 10.

In the depicted illustrative embodiment, in addition
to BPU 18, the execution circuitry of processor 10 comprises
multiple execution units for sequential instructions,
including fixed-point unit (FXU) 22, load/store unit
(LSU) 28, and floating-point unit (FPU) 30. As is well-known
to those skilled in the computer arts, each of execution
units 22, 28, and 30 typically executes one or
more instructions of a particular type of sequential instructions
during each processor cycle. For example,
FXU 22 performs fixed-point mathematical and logical
operations such as addition, subtraction, ANDing, ORing,
and XORing, utilizing source operands received
from specified general purpose registers (GPRs) 32 or
GPR rename buffers 33. Following the execution of a
fixed-point instruction, FXU 22 outputs the data results
of the instruction to GPR rename buffers 33, which provide
temporary storage for the result data until the instruction
is completed by transferring the result data
from GPR rename buffers 33 to one or more of GPRs
32. Conversely, FPU 30 typically performs single and
double-precision floating-point arithmetic and logical
operations, such as floating-point multiplication and division,
on source operands received from floating-point
registers (FPRs) 36 or FPR rename buffers 37. FPU 30
outputs data resulting from the execution of floating-point
instructions to selected FPR rename buffers 37,
which temporarily store the result data until the instructions
are completed by transferring the result data from
FPR rename buffers 37 to selected FPRs 36. As its
name implies, LSU 28 typically executes floating-point
and fixed-point instructions which either load data from
memory (i.e., either data cache 16 or main memory) into
selected GPRs 32 or FPRs 36 or which store data from
a selected one of GPRs 32, GPR rename buffers 33,
FPRs 36, or FPR rename buffers 37 to memory.

Processor 10 employs both pipelining and out-of- 
order execution of instructions to further improve the 
performance of its superscalar architecture. According- 
ly, instructions can be executed opportunistically by 
FXU 22, LSU 28, and FPU 30 in any order as long as 
data dependencies are observed. In addition, instruc- 
tions are processed by each of FXU 22, LSU 28, and 
FPU 30 at a sequence of pipeline stages. As is typical
of many high-performance processors, each instruction
is processed at five distinct pipeline stages, namely
fetch, decode/dispatch, execute, finish, and completion.

During the fetch stage, sequential fetcher 17 re- 
trieves one or more instructions associated with one or 
more memory addresses from instruction cache 14. As 
noted above, sequential instructions fetched from in- 
struction cache 14 are stored by sequential fetcher 17 
within instruction queue 19, while branch instructions 
are removed (folded out) from the sequential instruction 
stream. As described below with reference to Figure 2, 
branch instructions are executed by BPU 18, which includes
a novel branch prediction mechanism that enables
BPU 18 to speculatively execute unresolved conditional
branch instructions utilizing a selected one of
multiple branch prediction methodologies. 

During the decode/dispatch stage, dispatch unit 20
decodes and dispatches one or more instructions from
instruction queue 19 to execution units 22, 28, and 30.
During the decode/dispatch stage, dispatch unit 20 also
allocates a rename buffer within GPR rename buffers
33 or FPR rename buffers 37 for each dispatched instruction's
result data. According to the depicted illustrative
embodiment, instructions dispatched by dispatch
unit 20 are also passed to a completion buffer within
completion unit 40. Processor 10 tracks the program order
of the dispatched instructions during out-of-order execution
utilizing unique instruction identifiers.

During the execute stage, execution units 22, 28,
and 30 execute sequential instructions received from
dispatch unit 20 opportunistically as operands and execution
resources for the indicated operations become
available. Each of execution units 22, 28, and 30 is
preferably equipped with a reservation station that
stores instructions dispatched to that execution unit until
operands or execution resources become available. After
execution of an instruction has terminated, execution
units 22, 28, and 30 store data results of the instruction
within either GPR rename buffers 33 or FPR rename
buffers 37, depending upon the instruction type. Then,
execution units 22, 28, and 30 notify completion unit 40
which instructions stored within the completion buffer of
completion unit 40 have finished execution. Finally, instructions
are completed by completion unit 40 in program
order by transferring data results of the instructions
from GPR rename buffers 33 and FPR rename
buffers 37 to GPRs 32 and FPRs 36, respectively.

Referring now to Figures 2 and 5, there are depicted
a more detailed block diagram representation of BPU
18 of processor 10 and a flowchart detailing the execution
of a branch instruction within BPU 18. With reference
first to Figure 5, the process begins at block 200
and thereafter proceeds to blocks 202 and 204. Blocks
202 and 204 depict sequential fetcher 17 retrieving the
next set of sequential instructions from instruction cache
14 and sending the fetched instructions to BPU 18 and
instruction queue 19. As illustrated within Figure 2, BPU
18 receives up to two instructions each cycle from sequential
fetcher 17 and stores the instructions within instruction
registers (IR) 50 and 52. Storing the fetched
instructions within instruction registers 50 and 52 triggers
decoding of the instructions by branch decoders
54, as illustrated at block 206 of Figure 5.

Still referring to Figure 5, in response to a determination
at block 208 that an instruction stored within one
of instruction registers 50 and 52 is a non-branch instruction,
the instruction is simply discarded, as depicted
at block 210. Thereafter, the process passes from
block 210 to block 270, where the processing of the discarded
instruction terminates. If, on the other hand, a
determination is made at block 208 that an instruction
stored within one of instruction registers 50 and 52 is a
branch instruction, a further determination is made at
block 220 whether or not the instruction is a conditional
branch instruction. Unconditional branch instructions
are simply passed to branch select unit 56, which computes
the effective address (EA) of the target instruction.
As illustrated at block 222, branch select unit 56 then
transmits the EA to instruction cache 14 to initiate the
fetching of sequential instructions at the target address.

Returning to block 220, if a determination is made
by one of branch decode units 54 that a fetched instruction
is a conditional branch instruction, the process proceeds
to block 230, which depicts a determination of
whether or not the condition register (CR) field upon
which the branch depends is the target of an instruction
in progress, that is, an instruction which has been
fetched, but has not completed. In order to make the
determination illustrated at block 230, the instruction is
passed to branch select unit 56, which sends an indication
of which CR field the branch is dependent upon to
search logic 58 and 60. Utilizing the CR field identifier
received from branch select unit 56, search logic 58 examines
the instructions within instruction queue 19 to
determine if the CR field upon which the branch depends
is a target of one or more instructions within instruction
queue 19. A concurrent determination is made
by search logic 60 if the CR field of interest is a target
of a dispatched, but uncompleted instruction contained
within completion buffer 62 of completion unit 40. If the
CR field of interest is not the target of any instruction in
progress, the branch instruction has already been resolved
and the CR field upon which the branch depends
is present within an unillustrated CR special purpose
register (SPR) that stores a historic state of the CR. Accordingly,
the process proceeds from block 230 to block
232, which depicts branch decision unit 64 examining
the CR field of interest within the CR SPR and resolving
the conditional branch, if possible. The resolution of the
branch is then supplied to branch select unit 56. Thereafter,
as illustrated at block 222, branch select unit 56
computes the EA of the target instruction and transmits
the EA to instruction cache 14.

Returning now to block 230, if a determination is
made by search logic 58 or 60 that the CR field of interest
is the target of an instruction in progress, the process
proceeds to block 240, which illustrates a determination
by search logic 60 whether or not the CR bit supplied by
the instruction is available. The CR bit supplied by an
instruction is available if the instruction has finished
completion and the CR bit generated by the execution
of the instruction is stored within completion buffer 62.
If the CR bit is available, the process proceeds from
block 240 to block 242, which depicts branch decision
unit 64 examining the CR bit associated with the instruction
to resolve the branch. In addition, branch decision
unit 64 supplies the resolution of the conditional branch
instruction to branch select unit 56, which, as illustrated at
block 222, computes the EA of the next instructions to
be fetched and forwards the EA to instruction cache 14.

Referring again to block 240, if the CR bit is not yet
available (i.e., the instruction has not been dispatched
or has not yet finished execution), the process then
passes to blocks 250-260, which depict the speculative
execution of an unresolved conditional branch instruction.
Thus, branch decision unit 64 notifies selection logic
66 that the CR bit is not available, indicating that the
instruction is to be executed speculatively by prediction.
Next, as illustrated at block 250, selection logic 66 selects
a branch prediction methodology. According to an
important aspect of the illustrative embodiment and as
described in detail with reference to Figure 4,
selection logic 66 selects a branch prediction methodology
based upon the outcomes of past predictions,
which are stored within branch history table (BHT) 68.
Thus, in contrast to prior art processors which support
multiple branch prediction modes, the illustrative embodiment
depicted in Figure 1 intelligently and dynamically
selects a best branch prediction methodology
based upon the outcomes of prior predictions. Utilizing
the selected branch prediction methodology, branch decision
unit 64 predicts the branch as taken or not taken
and indicates the prediction to branch select unit 56, as
depicted at block 252. Thereafter, as illustrated at block
254, branch select unit 56 calculates and transmits the
EA of the target instruction to instruction cache 14.

The process illustrated in Figure 5 proceeds from 
block 254 to block 256, which depicts a determination 
of whether or not the conditional branch instruction re- 
solved as correctly predicted. The determination depict- 
ed at block 256 is made by branch decision unit 64, 
which receives the CR bit upon which the branch de- 
pends from search logic 60 following the finish of the 
associated instruction. If a determination is made by 
branch decision unit 64 that the branch was mispredict- 
ed, the process passes to block 258, which depicts 
branch decision unit 64 cancelling instructions within the 
speculative execution path of the mispredicted branch 
instruction from instruction queue 19 and execution
units 22, 28, and 30. The process then proceeds to block
260. Those skilled in the art will appreciate that for processor
implementations which permit more than one level
of speculation, all speculative instructions need not
be cancelled, but only those which were fetched in response
to misprediction of the branch. Returning to
block 256, the process proceeds from block 256 to block 
260 in response to a determination at block 256 that the 
branch was correctly predicted. Block 260 depicts se- 
lection logic 66 updating BHT 68, if necessary, to ensure 
that the appropriate branch prediction methodology is 
selected for subsequent speculative executions of the 
conditional branch instruction. Thereafter, the process 
terminates at block 270. 
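
The decision structure of the Figure 5 flowchart, as walked through above, can be summarised in a short sketch. The block numbers are taken from the description; the enum `Path`, the function `branch_path`, and its boolean parameters are hypothetical names introduced only for illustration, and the sketch models which path is chosen rather than the hardware that implements it.

```python
from enum import Enum


class Path(Enum):
    DISCARD = "block 210"              # non-branch instruction
    UNCONDITIONAL = "block 222"        # target EA computed directly
    RESOLVE_FROM_CR = "block 232"      # CR field not targeted by any instruction in progress
    RESOLVE_FROM_BUFFER = "block 242"  # CR bit already available
    SPECULATE = "blocks 250-260"       # predict and execute speculatively


def branch_path(is_branch: bool,
                is_conditional: bool,
                cr_field_in_progress: bool,
                cr_bit_available: bool) -> Path:
    """Return the flowchart path taken for one decoded instruction."""
    if not is_branch:                  # block 208
        return Path.DISCARD
    if not is_conditional:             # block 220
        return Path.UNCONDITIONAL
    if not cr_field_in_progress:       # block 230
        return Path.RESOLVE_FROM_CR
    if cr_bit_available:               # block 240
        return Path.RESOLVE_FROM_BUFFER
    return Path.SPECULATE


if __name__ == "__main__":
    print(branch_path(True, True, True, False).value)
```

Only the last path involves prediction; the other three resolve or discard the instruction without consulting the branch history table.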

With reference now to Figure 3, there is illustrated
a pictorial representation of branch history table (BHT)
68 of BPU 18. As depicted, BHT 68 comprises a table
including 256 entries 80, which are each accessed utilizing
an index 82. In the depicted embodiment, each
index 82 comprises the eight least significant bits of a
branch instruction address. Thus, for example, the
branch history of instructions having an address ending
with 00h is stored within the first entry 80, that of branch
instructions having addresses ending with 01h is stored
in the second entry, and so on. As depicted, each entry
80 comprises three bits, which together specify one of
seven prediction states (an eighth possible state is unused)
for a subsequent conditional branch instruction.
As with conventional branch history tables, a prediction
state stored within an entry 80 of BHT 68 is updated following
the resolution of the associated branch as taken
or not taken in order to more accurately predict a subsequent
execution of that branch instruction.
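
The indexing scheme described for BHT 68 (256 entries selected by the eight least significant address bits, three bits per entry) might be modelled as in the sketch below. The class and function names are hypothetical, and the initial entry value of "000" is an assumption, since the description does not state how entries are initialised.

```python
BHT_ENTRIES = 256   # number of entries 80
ENTRY_BITS = 3      # bits per entry


def bht_index(branch_address: int) -> int:
    """Index 82: the eight least significant bits of the branch address."""
    return branch_address & 0xFF


class BranchHistoryTable:
    def __init__(self) -> None:
        # Assumption: every entry starts at 0b000.
        self.entries = [0b000] * BHT_ENTRIES

    def read(self, branch_address: int) -> int:
        return self.entries[bht_index(branch_address)]

    def write(self, branch_address: int, state: int) -> None:
        if not 0 <= state < (1 << ENTRY_BITS):
            raise ValueError("prediction state must fit in three bits")
        self.entries[bht_index(branch_address)] = state


if __name__ == "__main__":
    bht = BranchHistoryTable()
    bht.write(0x1000, 0b010)
    # 0x2000 also ends in 00h, so it maps to the same entry.
    print(bht.read(0x2000))
```

Because only the low eight bits select an entry, branches at different addresses that share those bits also share one prediction state in this model.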

Referring now to Figure 4, there is depicted an illustrative
embodiment of a state machine utilized to select
a branch prediction methodology. State machine 88
comprises seven states, including four dynamic prediction
states 90-96 and three static prediction states
98-102. As indicated, each of prediction states 90-102
corresponds to one of the seven possible settings of
each entry 80 within BHT 68.

Referring first to static prediction state 100, if a 
branch instruction to be predicted maps to an entry 80 
of BHT 68 that is set to "000," static branch prediction 
is utilized to predict whether or not the branch should be 
taken. If the branch is resolved as taken, entry 80 within 
BHT 68 is updated to "001," as illustrated at static pre- 
diction state 102. It is important to note that state ma- 
chine 88 proceeds from static prediction state 100 to 
static prediction state 102 in response to resolution of 
the branch as taken regardless of whether or not the 
branch was correctly predicted. When the conditional
branch instruction is next predicted, static branch prediction
is again utilized as specified by static prediction
state 102. If the branch is again resolved as taken, entry
80 within BHT 68 is updated to "010" to indicate that
dynamic branch prediction is to be utilized for a next prediction,
as illustrated by dynamic prediction state 90.

Thereafter, state machine 88 remains at dynamic
prediction state 90 as long as subsequent branch predictions
are resolved as taken. However, in response to
the resolution of a branch as not taken, state machine
88 proceeds from dynamic prediction state 90 to dynamic
prediction state 92 and entry 80 within BHT 68 is updated
to "011." When entry 80 is set to "011," which corresponds
to dynamic prediction state 92, the branch instruction
which maps to entry 80 will again be predicted
as taken. In response to a resolution of the branch as
taken (i.e., correctly predicted), state machine 88 returns
to dynamic prediction state 90, which has been
described. However, in response to a resolution of the
branch as not taken, the state machine 88 returns from
dynamic prediction state 92 to static prediction state
100.

Referring again to static prediction state 100, if the
branch instruction is resolved as not taken, the state machine
88 passes from static prediction state 100 to static
prediction state 98, at which entry 80 is updated to "111."
If a next occurrence of the branch instruction is resolved
as taken, state machine 88 returns from static prediction
state 98 to static prediction state 100, which has been
described. On the other hand, if the branch instruction
is resolved as not taken, state machine 88 proceeds
from static prediction state 98 to dynamic prediction
state 96, at which entry 80 is updated to "110."

Conditional branches are predicted as not taken
while state machine 88 is at dynamic branch prediction
state 96 and entry 80 is correspondingly set to "110." If
a predicted conditional branch is resolved as not taken,
state machine 88 remains at dynamic prediction state
96. Alternatively, if a predicted conditional branch is resolved
as taken, state machine 88 proceeds to dynamic
prediction state 94, which corresponds to the BHT setting
"101." Again, conditional branches are predicted as
not taken when state machine 88 is at dynamic prediction
state 94. If a branch predicted as not taken at dynamic
prediction state 94 is resolved as not taken, state
machine 88 returns to dynamic prediction state 96,
which has been described. However, if the branch is resolved
as taken, the process returns to static prediction
state 100, which has also been described.

As can be seen from the foregoing description of
Figure 4, while state machine 88 is in one of static prediction
states 98-102, the direction of branch resolution
determines the next state of the prediction mechanism.
After a branch is resolved as taken or not taken two consecutive
times, a dynamic branch prediction methodology
is utilized. Similarly, if state machine 88 is in one of
dynamic prediction states 90 or 96, branches are predicted
utilizing static branch prediction following two
consecutive branch mispredictions. Although Figure 4
depicts an illustrative embodiment of state machine 88,
those skilled in the art will appreciate that other state
machines can be implemented which enable a branch
prediction methodology to be intelligently selected
based upon branch history. Furthermore, those skilled
in the art will appreciate that multiple types of static and
dynamic prediction may be utilized. For example, BPU
18 can implement a simple static branch prediction
scheme that predicts the resolution of a branch instruction
based upon whether the indicated branch is a forward
or backward branch; alternatively, more complex
static branch prediction schemes can be employed
which predict branch instructions based upon information
learned during program compilation.
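
The transitions of state machine 88 described with reference to Figure 4 can be collected into a small table-driven sketch. Two points are assumptions rather than statements from the description: the not-taken transition out of static prediction state 102 is not spelled out and is taken here to return to state 100, by symmetry with state 98; and the encoding of state 98 is read as "111". The names `NEXT_STATE`, `DYNAMIC_PREDICTION`, `predict`, `next_state`, and `run` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# state: (next state if resolved not taken, next state if resolved taken)
NEXT_STATE: Dict[int, Tuple[int, int]] = {
    0b000: (0b111, 0b001),  # static state 100
    0b001: (0b000, 0b010),  # static state 102 (not-taken edge is an assumption)
    0b010: (0b011, 0b010),  # dynamic state 90, predicts taken
    0b011: (0b000, 0b010),  # dynamic state 92, predicts taken
    0b111: (0b110, 0b000),  # static state 98 (encoding read as "111")
    0b110: (0b110, 0b101),  # dynamic state 96, predicts not taken
    0b101: (0b110, 0b000),  # dynamic state 94, predicts not taken
}

# Direction predicted in each dynamic state; static states are absent.
DYNAMIC_PREDICTION: Dict[int, bool] = {
    0b010: True, 0b011: True, 0b110: False, 0b101: False,
}


def predict(state: int, static_prediction: bool) -> bool:
    """Use the dynamic direction if the state is dynamic, else the static guess."""
    return DYNAMIC_PREDICTION.get(state, static_prediction)


def next_state(state: int, taken: bool) -> int:
    """Advance on branch resolution, regardless of prediction correctness."""
    return NEXT_STATE[state][1 if taken else 0]


def run(state: int, outcomes: Iterable[bool]) -> int:
    for taken in outcomes:
        state = next_state(state, taken)
    return state


if __name__ == "__main__":
    print(format(run(0b000, [True, True]), "03b"))    # 010
    print(format(run(0b010, [False, False]), "03b"))  # 000
```

Starting from "000", two consecutive taken resolutions reach dynamic state 90, and two consecutive mispredictions from there return to static state 100, which matches the summary given above.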

While an illustrative embodiment has been particu- 
larly shown and described, it will be understood by those 
skilled in the art that various changes in form and detail 
may be made therein without departing from the spirit 
and scope of the illustrative embodiment. 

Claims 

1. A processor, comprising: 

one or more execution units for executing instructions,
said one or more execution units including a branch
processing unit for executing branch instructions,
said branch processing unit including:

selection logic for selecting one of a plurality of
branch prediction methodologies;
a branch prediction unit for predicting a resolution
of a conditional branch instruction utilizing
said selected branch prediction methodology;
and
means for speculatively executing said branch
instruction according to said prediction.

2. A processor according to Claim 1, wherein said
selection logic is adapted to select a branch prediction
methodology for predicting a resolution of a subsequent
conditional branch instruction in response to
an outcome of said prediction.

3. A processor according to Claim 1 or Claim 2, where- 
in said plurality of branch prediction methodologies 
includes dynamic branch prediction. 

4. A processor according to Claim 3, further comprising:

a branch history table, wherein said branch
history table stores an indication of whether said
conditional branch instruction should be predicted
as taken or not taken if dynamic branch prediction
is utilized.

5. A processor according to Claim 3 or Claim 4, where- 
in said plurality of branch prediction methodologies 
includes static branch prediction.

6. A processor according to any one of Claims 3 to 5,
wherein said selection logic comprises:

means, responsive to a first selected number
of incorrect predictions of conditional branch
instructions utilizing dynamic branch prediction,
for selecting static branch prediction; and
means, responsive to a resolution of a second
selected number of statically predicted conditional
branch instructions as all taken or not taken,
for selecting dynamic branch prediction.

7. A processor according to any one of the preceding
Claims, wherein said processor has an associated
memory for storing instructions, said processor further
comprising a fetcher, wherein said fetcher
fetches from memory one or more instructions for
execution within a speculative execution path indicated
by said prediction.

8. The processor of Claim 7, said processor further
comprising:

means for cancelling said instructions within
said speculative execution path in response to a
resolution of said conditional branch instruction as
mispredicted.

9. A data processing system, comprising:

a bus;

a memory coupled to said bus, wherein said
memory stores instructions to be executed;
a fetcher for fetching instructions from said
memory, said fetcher being coupled to said
memory via said bus;

one or more execution units for executing
fetched instructions, said one or more execution
units including a branch processing unit for
executing branch instructions, said branch
processing unit including:

selection logic for selecting one of a plurality
of branch prediction methodologies;
a branch prediction unit for predicting a resolution
of a conditional branch instruction
utilizing said selected branch prediction
methodology; and

means for speculatively executing said
branch instruction according to said prediction.

10. A data processing system according to Claim 9
wherein, based upon an outcome of said prediction,
said selection logic selects a branch prediction
methodology for predicting a resolution of a subsequent
conditional branch instruction, such that
branch prediction accuracy is enhanced.

11. A method within a processor for speculatively executing
a conditional branch instruction, said method
comprising:

selecting one of a plurality of branch prediction
methodologies to predict a resolution of a conditional
branch instruction;
predicting a resolution of said conditional
branch instruction as taken or not taken utilizing
said selected branch prediction methodology;
and

speculatively executing said conditional branch
instruction according to said prediction.

12. A method according to Claim 11 including the subsequent
steps of resolving said conditional branch
instruction as taken or not taken; and

in response to said resolution of said branch
instruction, utilizing an outcome of said prediction
to select a branch prediction methodology for predicting
a resolution of a subsequent conditional
branch instruction, wherein branch prediction accuracy
is enhanced.

13. A method according to Claim 11 or Claim 12, wherein
said plurality of branch prediction methodologies
includes dynamic branch prediction.

14. A method according to Claim 13, further comprising:

storing an indication of whether or not said
subsequent conditional branch instruction should
be predicted as taken or not taken utilizing dynamic
branch prediction.

15. A method according to Claim 13 or Claim 14, wherein
said plurality of branch prediction methodologies
includes static branch prediction.

16. A method according to Claim 15, wherein said step 
of selecting one of a plurality of branch prediction 
methodologies comprises: 

in response to a first selected number of incorrect
predictions of branch instructions utilizing
dynamic branch prediction, selecting static
branch prediction; and

in response to a resolution of a second selected
number of statically predicted branch instructions
as all taken or not taken, selecting dynamic
branch prediction.